APPLICATIONS OF PARALLEL PROCESS HIMAP FOR LARGE SCALE MULTIDISCIPLINARY PROBLEMS

Guru P. Guruswamy, Mark Potsdam and David Rodriguez 
NAS Division, Information Systems Directorate 
Ames Research Center, Moffett Field CA 94035. 
gguruswamy@mail.arc.nasa.gov 

Since the initiation of the HPCC Project at NASA in the early 1990s, several computational tools have 
been developed under the computational aerosciences (CAS) area[1]. One such tool is HiMAP, a 
supermodular, 3-level parallel, portable, high-fidelity, tightly coupled analysis process[2,3]. This process 
is designed using state-of-the-art information technology tools such as MPIRUN[4], an efficient 
protocol that allows groups of processors to communicate with other groups of processors for 
multidisciplinary computations. MPIRUN is based on the Message Passing Interface (MPI) 
standard[5], which is currently supported by most major computer vendors. 

The modularity in HiMAP is based on the function of the individual discipline module. In general, an 
analysis process in HiMAP is divided into independent and dependent modules. Independent modules 
are those dealing with specific physics and can be used in stand-alone mode as a single discipline code. 
For example, the fluids module is an independent module. Dependent modules are usually required for 
multidisciplinary computations. They depend on more than one module for their function. A typical 
example of a dependent module is the thermal loads module in aero-thermoelastic computations. This 
module requires temperature data from the fluids module and also structural data from the structures 
module to convert temperature to thermal loads. Figure 1 shows a typical process chart for HiMAP. 
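
The distinction between independent and dependent modules can be illustrated with a small dependency sketch. This is not HiMAP code: the `Module` class, the `execution_order` helper and the module names are hypothetical, chosen only to mirror the fluids, structures and thermal loads example above.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical description of a discipline module (not a HiMAP type)."""
    name: str
    requires: tuple = ()  # names of modules whose data this module needs

    @property
    def independent(self):
        # An independent module needs no data from other modules.
        return not self.requires


def execution_order(modules):
    """Order module names so each one follows every module it requires."""
    by_name = {m.name: m for m in modules}
    order, done = [], set()

    def visit(name, stack=()):
        if name in done:
            return
        if name in stack:
            raise ValueError(f"circular dependency at {name}")
        for dep in by_name[name].requires:
            visit(dep, stack + (name,))
        done.add(name)
        order.append(name)

    for m in modules:
        visit(m.name)
    return order


modules = [
    Module("thermal_loads", requires=("fluids", "structures")),
    Module("fluids"),
    Module("structures"),
]
order = execution_order(modules)
independent_names = [m.name for m in modules if m.independent]
```

Here the thermal loads module is placed after both modules that supply its input data, while the fluids and structures modules can each be used on their own.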

The three-level (intra-discipline, inter-discipline and multiple run) parallel capability is built into 
HiMAP using MPI and MPIRUN protocols. On a typical massively parallel supercomputer, a set of 
processors is assigned to each discipline as needed. Communication within each discipline is 
accomplished using MPI. Communication between disciplines and between cases is achieved using MPIRUN. 
Figure 2 shows the multilevel parallel capability of HiMAP. 
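
One way to picture the three levels is as a mapping from a global processor rank to a case, a discipline and a local rank within that discipline. The sketch below is an assumption about how such a table could be laid out; it does not call MPI or MPIRUN, and the function name `build_rank_map` and the processor counts are hypothetical.

```python
def build_rank_map(num_cases, procs_per_discipline):
    """Return a list of (global_rank, case, discipline, local_rank) tuples.

    Assumed layout: ranks are numbered consecutively, case by case, and
    within each case discipline by discipline.
    """
    rank_map = []
    global_rank = 0
    for case in range(num_cases):                                  # multiple-run level
        for discipline, nprocs in procs_per_discipline.items():    # inter-discipline level
            for local_rank in range(nprocs):                       # intra-discipline level
                rank_map.append((global_rank, case, discipline, local_rank))
                global_rank += 1
    return rank_map


# Hypothetical setup: 2 cases, each with 4 fluids processors and 1 structures processor.
rank_map = build_rank_map(2, {"fluids": 4, "structures": 1})
for entry in rank_map:
    print(entry)
```

In an MPI program, processors sharing the same case and discipline would typically form one communicator group, for example through the standard `MPI_Comm_split` call; whether MPIRUN is implemented that way is not stated here.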

The hybrid coarse-fine grain parallelization achieves load-balanced execution provided that 
there are enough processors available to handle the total number of blocks. However, 
load balancing alone does not guarantee efficient use of the computational nodes. The 
nodes might be working with less than the optimal computational load while performing many expensive 
inter-processor communications, and hence be data-starved. Both problems are alleviated by introducing 
a domain-coalescing capability into the parallelization scheme. In domain coalescing, a number of blocks 
are assigned to a single processor, which economizes on computational resources and 
yields a more favorable communication-to-computation ratio during execution. This process 
is illustrated in Fig. 3 and described in Ref. 6, which was presented at SC97. 
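
The effect of coalescing on the communication-to-computation ratio can be shown with a toy model. The sketch assumes that computation scales with the number of grid points and that only interfaces between blocks on different processors cost communication; the function name and all numbers are hypothetical, not taken from Ref. 6.

```python
def comm_to_comp_ratio(block_points, interfaces, owner):
    """Ratio of inter-processor interface points to total grid points.

    block_points: grid points per block
    interfaces:   {(i, j): shared_points} for neighboring blocks i and j
    owner:        owner[i] is the processor holding block i
    """
    compute = sum(block_points)
    # Interfaces between blocks on the same processor need no message passing.
    comm = sum(size for (i, j), size in interfaces.items() if owner[i] != owner[j])
    return comm / compute


# Hypothetical chain of four equal blocks with three interfaces.
block_points = [1000, 1000, 1000, 1000]
interfaces = {(0, 1): 100, (1, 2): 100, (2, 3): 100}

one_per_proc = comm_to_comp_ratio(block_points, interfaces, owner=[0, 1, 2, 3])
coalesced = comm_to_comp_ratio(block_points, interfaces, owner=[0, 0, 1, 1])
print(one_per_proc, coalesced)
```

With two blocks per processor, the same grid runs on half as many processors and only one of the three interfaces still crosses a processor boundary.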

HiMAP is suitable for large-scale multidisciplinary analyses. It incorporates Euler/Navier-Stokes based 
flow solvers such as ENSAERO[7] and USM3D[8], and finite element based structures solvers such as 
NASTRAN[9]. To date, HiMAP has been demonstrated for large-scale aeroelastic applications that 
required 16 million fluid grid points and 20,000 structural finite elements. Cases have been 
demonstrated using up to 228 nodes on IBM SP2 and 256 nodes on SGI Origin2000 computers. Typical 
configurations analyzed are full subsonic and supersonic aircraft. 

Figure 4 shows the 34-block grid for the sub-scale wind tunnel model of an L1011 transport 
aircraft[10,11]. The total grid size is 9 million points. Due to geometric complexity, grid sizes vary 
significantly from block to block. The ratio of smallest to largest grid size is 0.07. This distribution can 
lead to inefficient computations on MPP systems. An algorithm based on node filling is currently incorporated in 
HiMAP. This algorithm assigns multiple grids to a single processor. Using the node filling algorithm, 
which will be explained in detail in the full paper, the 34-block grid was mapped onto 24 Origin2000 
processors. Figure 5 shows the original grid distribution, with one block assigned per processor, and the 
modified block distribution obtained with the node filling algorithm. The computational efficiency is 
increased by 30% with the new distribution. Further research is ongoing on a neural network 
approach to load balancing. 
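
Since the node filling algorithm itself is deferred to the full paper, the following is only a plausible sketch of the idea of assigning multiple grids to a single processor. It swaps in a standard first-fit-decreasing packing with the largest block as the per-processor capacity; the function name `node_fill` and the block sizes are hypothetical and are not the L1011 data.

```python
def node_fill(block_sizes, capacity=None):
    """Greedy first-fit-decreasing packing of grid blocks onto processors.

    Assumption: no processor should hold more points than the largest
    single block, so that block sets the capacity by default.
    """
    if capacity is None:
        capacity = max(block_sizes)
    nodes = []  # each node: {"load": total points, "blocks": block indices}
    # Place the largest blocks first, then fill the gaps with smaller ones.
    for idx in sorted(range(len(block_sizes)), key=lambda i: -block_sizes[i]):
        size = block_sizes[idx]
        for node in nodes:
            if node["load"] + size <= capacity:
                node["load"] += size
                node["blocks"].append(idx)
                break
        else:
            nodes.append({"load": size, "blocks": [idx]})
    return nodes


# Hypothetical block sizes (thousands of points); smallest/largest = 0.07.
sizes = [100, 70, 50, 30, 20, 10, 7]
nodes = node_fill(sizes)
loads = [node["load"] for node in nodes]
print(len(nodes), loads)
```

In this made-up case seven blocks fit on three processors with nearly equal loads, whereas one block per processor would leave most processors far below the load of the largest block.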

Fig. 6 shows one of the 5 structural modes from the finite element computations. One O2000 node was 
assigned to the modal data. Computations were made using the GO3D (Goorjian-Obayashi Streamwise 
Upwind Algorithm) fluids solver[12] and the ENSAERO modal solver[13] along with the 
MBMG (MultiBlockMovingGrid)[11,14] moving grid module available in the HiMAP process. A typical 
aeroelastic solution is shown in Fig. 7. The stability and convergence of the GO3D algorithm were not 
affected by redistribution of patched grids to different processors. Figure 8 shows a plot of current grid 
size versus grid blocks. This grid is for a complex state-of-the-art aircraft. 

HiMAP is multi-platform middleware. It has been tested on IBM SP2, SUN HPC6000 and SGI O2000 
systems. Scalable performance and portability are shown in Fig. 9. Work is in progress to port HiMAP to 
newer platforms such as the SGI O3000. 

Work is in progress to apply HiMAP to aeroelastic computations of more complex 
configurations than those presented in this paper. Future work involves mapping HiMAP onto PSE 
(Problem Solving Environment) and IPG (Information Power Grid) tools. 
TYPICAL ANALYSIS PROCESS IN HIMAP 
Fig. 1 Parallel Process 





Fig. 2 Multilevel process 



Fig. 3 Load balancing approach in HiMAP 









MULTI BLOCK LOAD BALANCING 



Fig. 5 Results of load balancing 




L1011 mode 2 



Fig.6 Modal data from Finite element analysis 



Fig. 7 Typical aeroelastic results for L1011 sub-scale wind tunnel model 






COUPLED COMPUTATIONS 



MULTI-BLOCK GRID FOR FLOW SOLVER 

L1011 WIND TUNNEL MODEL 
9 MILLION GRID POINTS, 34 BLOCKS 






RESULTS FROM NODE FILLING ALGORITHM IN HIMAP 



L1011 WIND TUNNEL MODEL, 9 MILLION GRID POINTS 
NUMBER OF PROCESSORS REDUCED TO 28 FROM 34 




MODAL STRUCTURAL DATA FOR L1011 

12 MODES WITH 2100 DEGREES OF FREEDOM PER MODE 
DEMONSTRATION OF CODE MODULARITY 



Paragon Research 


MODULARITY IN FLUID/STRUCTURAL SOLVERS 

• ZONAL PATCHED FLOW GRIDS (ENSAERO, TLS3D, CFL3D, USM3D TYPE) 
SCALABLE PERFORMANCE 

777 AIRCRAFT (W-B), 2.6 M GRID POINTS ON SGI O2000 



NOTE : 7.5 M GRID POINTS CASE RUNS AT 4.3 GFLOP USING 60 PROCESSORS 
(USING R10000 CONFIGURATION WITHOUT OPTIMIZATION) 



APPLICATIONS OF PARALLEL PROCESS HIMAP 
AEROSPACE TECHNOLOGY NEEDS ARE RAPIDLY CHANGING 



- SEAMLESS INTEGRATION AMONG DISCIPLINES 

- PORTABLE AND SCALABLE WITH HARDWARE (MPP) 

- LEAD TO PAPERLESS DESIGN 

- CAPABLE OF ‘TIGHTLY’ AND ‘LOOSELY’ COUPLED SIMULATION 
SOFTWARE MODULARITY 
HARDWARE MODULARITY 



CAN BE BUILT UP TO 512 
PROCESSORS IN UNITS 
OF 8 NODES 


HIMAP VISION CHART 










THREE LEVEL PARALLEL PROCESS 

USING MPIRUN 



SEPARATE 

EXECUTABLES 


WHAT AND WHY IS MPIRUN ? 
MULTIZONAL DATA COMMUNICATION PROCESS IN HIMAP 
rank node rank 







MULTIDISCIPLINE DATA COMMUNICATION IN HIMAP 



global processor local 

rank node rank 












CURRENT ANALYSIS CAPABILITY OF HIMAP 



■ INDEPENDENT MODULES 

■ DEPENDENT MODULES 




An Intercube Communication Between Fluid and Structural Domains 




EXAMPLE FOR UNCOUPLED COMPUTATIONS 



NOTE : 7 GFLOP PERFORMANCE ON 160 NODES ON IBM SP2 

SUITABLE FOR LOW-FIDELITY COMPUTATIONS TO FILL DESIGN SPACE 









PORTABILITY AND PERFORMANCE OF HiMAP 
CONCLUSIONS AND FUTURE PLANS 



• CONTINUE DEVELOPMENT WITH HPCC AND IT PROGRAMS AND 
APPLY FOR SPACE AND DOD (UCAV, AWS, ASCI/FWV) PROJECTS 



